After being pulled from the CPD data warehouse, the data has the following format:

##      DATEOCC YEAR MONTH DAY DOW CURR_IUCR FBI_CD AREA BEAT DISTRICT
## 1 2008-01-01 2008     1   1 Tue      0320     03    2  631        6
## 2 2008-01-01 2008     1   1 Tue      0265     02    5 1412       14
## 3 2008-01-01 2008     1   1 Tue      1754     02    1  725        7
##   X_COORD Y_COORD LOCATION INC_CNT
## 1 1183288 1850874      304       1
## 2 1152781 1918361      090       1
## 3 1167145 1859291      290       1

A preview of the variables

## 'data.frame':    693175 obs. of  14 variables:
##  $ DATEOCC  : Date, format: "2008-01-01" "2008-01-01" ...
##  $ YEAR     : int  2008 2008 2008 2008 2008 2008 2008 2008 2008 2008 ...
##  $ MONTH    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ DAY      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ DOW      : Factor w/ 7 levels "Fri","Mon","Sat",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ CURR_IUCR: Factor w/ 75 levels "0110","0130",..: 20 7 75 14 75 74 75 8 8 75 ...
##  $ FBI_CD   : Factor w/ 7 levels "01A","02","03",..: 3 2 2 2 2 2 2 2 2 2 ...
##  $ AREA     : Factor w/ 6 levels "0","1","2","3",..: 3 6 2 2 3 3 5 5 3 6 ...
##  $ BEAT     : Factor w/ 305 levels "111","112","113",..: 71 172 83 101 269 54 138 169 67 212 ...
##  $ DISTRICT : Factor w/ 26 levels "1","2","3","4",..: 6 14 7 8 22 5 11 13 6 17 ...
##  $ X_COORD  : int  1183288 1152781 1167145 1160916 1173738 1182361 1150832 1162552 1171606 1148547 ...
##  $ Y_COORD  : int  1850874 1918361 1859291 1859682 1836987 1838427 1899022 1900718 1853535 1929621 ...
##  $ LOCATION : Factor w/ 99 levels "","090","092",..: 87 2 79 79 79 79 79 79 79 79 ...
##  $ INC_CNT  : int  1 1 1 1 1 1 1 1 1 1 ...

A summary of the data

##     DATEOCC                YEAR          MONTH             DAY       
##  Min.   :2008-01-01   Min.   :2008   Min.   : 1.000   Min.   : 1.00  
##  1st Qu.:2009-06-20   1st Qu.:2009   1st Qu.: 4.000   1st Qu.: 8.00  
##  Median :2011-02-19   Median :2011   Median : 7.000   Median :16.00  
##  Mean   :2011-03-25   Mean   :2011   Mean   : 6.513   Mean   :15.66  
##  3rd Qu.:2012-11-20   3rd Qu.:2012   3rd Qu.: 9.000   3rd Qu.:23.00  
##  Max.   :2014-12-31   Max.   :2014   Max.   :12.000   Max.   :31.00  
##                                                                      
##   DOW           CURR_IUCR      FBI_CD         AREA             BEAT       
##  Fri: 98816   0486   :210289   01A:  3036   0   :     3   421    :  6896  
##  Mon: 94869   0460   :148648   02 : 12482   1   :199558   624    :  6674  
##  Sat:105105   0560   : 99396   03 : 99219   2   :201478   423    :  6650  
##  Sun:108317   0320   : 36399   04A: 36279   3   :130101   511    :  5567  
##  Thu: 95168   031A   : 35239   04B: 60408   4   : 77124   612    :  5450  
##  Tue: 94597   0430   : 20262   08A:109131   5   : 84889   (Other):661920  
##  Wed: 96303   (Other):142942   08B:372620   NA's:    22   NA's   :    18  
##     DISTRICT         X_COORD           Y_COORD           LOCATION     
##  7      : 52718   Min.   :1094469   Min.   :1813932   303    :136428  
##  11     : 47998   1st Qu.:1152995   1st Qu.:1856702   090    :125008  
##  6      : 47268   Median :1166410   Median :1878481   304    :117738  
##  4      : 46469   Mean   :1165222   Mean   :1881764   290    :110815  
##  8      : 45469   3rd Qu.:1177026   3rd Qu.:1906159   314    : 25288  
##  (Other):453235   Max.   :1205097   Max.   :1951533   092    : 20512  
##  NA's   :    18                                       (Other):157386  
##     INC_CNT 
##  Min.   :1  
##  1st Qu.:1  
##  Median :1  
##  Mean   :1  
##  3rd Qu.:1  
##  Max.   :1  
## 

The summary of how the crime counts are distributed in each area

## 
##      0      1      2      3      4      5   <NA> 
##      3 199558 201478 130101  77124  84889     22

and in each district

## 
##     1     2     3     4     5     6     7     8     9    10    11    12 
## 13929 25707 42677 46469 38309 47268 52718 45469 34477 34964 47998 18116 
##    13    14    15    16    17    18    19    20    21    22    23    24 
## 10239 22235 34333 17396 17217 18321 14324 10541  8421 23413  7202 20309 
##    25    31  <NA> 
## 41098     7    18

Two things are worth noting: (a) District 31 has only 7 incidents, and (b) Area 0 has only 3 incidents during the seven-year period (2008-2014).

Most of the missing values (appearing in the attributes AREA, DISTRICT and BEAT) occur in the same rows.
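One way to check how the missing values overlap is to tabulate the missingness pattern row by row. The sketch below is only illustrative: `toy` is a small made-up data frame that borrows the column names, not the CPD data, and the variable names are hypothetical.

```r
# Toy stand-in for the crime table (values are made up for illustration)
toy <- data.frame(
  AREA     = factor(c("1", "2", NA, NA, "3", NA)),
  DISTRICT = factor(c("6", "14", NA, NA, "7", "8")),
  BEAT     = factor(c("631", "1412", NA, NA, "725", "811"))
)

# Counts per level, keeping NA as its own category
area_counts <- table(toy$AREA, useNA = "ifany")

# Logical matrix of missingness, one column per attribute
na_mat <- is.na(toy[, c("AREA", "DISTRICT", "BEAT")])

# Rows where all three attributes are missing at once
all_missing <- unname(which(rowSums(na_mat) == 3))

# Rows where at least one attribute is missing
any_missing <- unname(which(rowSums(na_mat) > 0))
```

If `all_missing` and `any_missing` are nearly the same set, the missing values are concentrated in the same records.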

From the shape files provided by CPD, the area, district and beat polygon maps are shown below

## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/xiaomuliu/CrimeProject/SpatioTemporalModeling/CPDShapeFiles/", layer: "area_bndy"
## with 8 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/xiaomuliu/CrimeProject/SpatioTemporalModeling/CPDShapeFiles/", layer: "district_bndy"
## with 28 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/xiaomuliu/CrimeProject/SpatioTemporalModeling/CPDShapeFiles/", layer: "beat_bndy"
## with 288 features and 3 fields
## Feature type: wkbPolygon with 2 dimensions

A scatter plot of violent crime locations on a single day (2014-01-01)

Let’s first aggregate the data by policing beat/district to see whether there are any spatial or temporal patterns at the beat/district level. Both of the plots below examine whether different districts have similar seasonal patterns.

The top plot shows the daily crime time series. Note that the series for districts 13, 21, and 23 appear to be truncated. It turns out that data for districts 13, 21 and 23 are only available up to 2012/12/16, 2013/03/02, and 2013/03/01 respectively. For the bottom plot, the crime counts were first grouped by year and then aggregated by district and month. Interestingly, seasonal patterns do vary across districts.
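The truncation check and the monthly aggregation can be sketched in base R as follows. This is a minimal illustration on a made-up data frame (`toy`) that reuses the column names; the object names `last_seen` and `monthly` are hypothetical and the code is not taken from the original analysis.

```r
# Toy stand-in for the crime table (dates are made up for illustration)
toy <- data.frame(
  DATEOCC  = as.Date(c("2012-12-15", "2012-12-16", "2014-12-30",
                       "2014-12-31", "2013-03-01", "2013-02-27")),
  DISTRICT = c(13, 13, 7, 7, 23, 23),
  INC_CNT  = 1
)
toy$YEAR  <- as.integer(format(toy$DATEOCC, "%Y"))
toy$MONTH <- as.integer(format(toy$DATEOCC, "%m"))

# Last date with a recorded incident in each district,
# returned as a named character vector ("YYYY-MM-DD")
last_seen <- sapply(split(toy$DATEOCC, toy$DISTRICT),
                    function(d) format(max(d)))

# Incident counts per district, year and month
monthly <- aggregate(INC_CNT ~ DISTRICT + YEAR + MONTH, data = toy, FUN = sum)
```

Districts whose `last_seen` date falls well before the end of the study period are the ones whose series look truncated.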

Grouping by beat gives a higher-resolution view of the spatial and temporal patterns. However, as we have nearly 300 beats, we use a heat map rather than multi-panel plots to show these patterns.
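The matrix behind such a heat map has one row per beat and one column per time bin. A minimal sketch, assuming monthly bins and a made-up data frame (`toy`) with the same column names; the `YM` column and `beat_month` object are hypothetical:

```r
# Toy stand-in for the crime table (values are made up for illustration)
toy <- data.frame(
  DATEOCC = as.Date(c("2014-01-03", "2014-01-20", "2014-02-11",
                      "2014-02-15", "2014-02-28", "2014-03-07")),
  BEAT    = c("421", "421", "421", "624", "624", "423"),
  INC_CNT = 1
)

# Year-month label used as the time bin
toy$YM <- format(toy$DATEOCC, "%Y-%m")

# Beat-by-month table of incident counts (zeros where nothing occurred)
beat_month <- xtabs(INC_CNT ~ BEAT + YM, data = toy)
```

A call such as `heatmap(beat_month, Rowv = NA, Colv = NA)` would then draw the table without reordering rows or columns.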

Again, some beats show a strong, decreasing periodic seasonal trend while others don’t. Crime counts in adjacent beats are usually close.

Now let’s move from regional analysis to city-wide analysis. Here is an incident location plot for the year 2014.

It is difficult to tell whether crime location clusters vary over time just by looking at the point plots. Let’s move to grid (pixel)-based analysis. First, the point data was rasterized by binning into a 100 \(\times\) 100 grid (the boundaries were defined by the range of the x- and y-coordinates of all available crime locations, plus a margin of 1000 units on each side). Below is an example of pixelized violent crime locations in January 2014.
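The binning step can be sketched in base R as below. This is an illustration under stated assumptions rather than the code used for the figures: the coordinates are random stand-ins drawn within the observed X_COORD/Y_COORD range, and the grid bounds come from the toy points instead of all available crime locations.

```r
set.seed(1)
# Made-up point locations within the observed coordinate range
x <- runif(500, 1094469, 1205097)
y <- runif(500, 1813932, 1951533)

n_bins <- 100   # grid resolution in each direction
margin <- 1000  # padding added on each side, in coordinate units

# Bin edges: data range extended by the margin
x_breaks <- seq(min(x) - margin, max(x) + margin, length.out = n_bins + 1)
y_breaks <- seq(min(y) - margin, max(y) + margin, length.out = n_bins + 1)

# Map each coordinate to its bin index (1..n_bins)
ix <- findInterval(x, x_breaks, rightmost.closed = TRUE)
iy <- findInterval(y, y_breaks, rightmost.closed = TRUE)

# 100 x 100 table of incident counts per pixel (empty pixels are 0)
grid_counts <- table(factor(ix, levels = seq_len(n_bins)),
                     factor(iy, levels = seq_len(n_bins)))
```

Fixing the factor levels to `1..n_bins` keeps empty rows and columns in the table, so every month is rasterized onto the same grid.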

Next, we perform kernel density estimation (KDE) on the monthly aggregation for each year. The kernel applied here is a 2D Gaussian kernel with the same bandwidth in each direction. The bandwidth was selected through cross-validation (minimizing the MSE) using all available data (2008-2014). The figure below shows the KDE for each month of 2014.
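The report does not give the exact cross-validation criterion, so the sketch below assumes least-squares cross-validation, a common choice that estimates the integrated squared error. It uses synthetic points and hypothetical helper names (`gauss2d`, `lscv`, `kde_grid`); it illustrates an isotropic Gaussian KDE and is not the original implementation.

```r
set.seed(42)
# Synthetic 2D points from two clusters (not the crime data)
pts <- cbind(x = c(rnorm(60, 0, 1), rnorm(40, 4, 1)),
             y = c(rnorm(60, 0, 1), rnorm(40, 3, 1)))

# Isotropic 2D Gaussian density with standard deviation s,
# evaluated at squared distance d2 from its centre
gauss2d <- function(d2, s) exp(-d2 / (2 * s^2)) / (2 * pi * s^2)

# Least-squares cross-validation score for bandwidth h (assumed criterion):
# integral of fhat^2 minus twice the mean leave-one-out density
lscv <- function(h, pts) {
  n  <- nrow(pts)
  d2 <- as.matrix(dist(pts))^2
  # Two Gaussians with sd h convolve to one with sd sqrt(2) * h
  int_f2 <- sum(gauss2d(d2, sqrt(2) * h)) / n^2
  k <- gauss2d(d2, h)
  diag(k) <- 0                      # leave each point out of its own estimate
  int_f2 - 2 * sum(k) / (n * (n - 1))
}

# Pick the bandwidth with the lowest score on a candidate grid
h_grid <- seq(0.2, 2, by = 0.1)
scores <- vapply(h_grid, lscv, numeric(1), pts = pts)
h_best <- h_grid[which.min(scores)]

# Evaluate the KDE on a regular grid; the 2D Gaussian factorizes into x and y parts
kde_grid <- function(pts, h, gx, gy) {
  kx <- outer(gx, pts[, "x"], function(g, p) dnorm(g, mean = p, sd = h))
  ky <- outer(gy, pts[, "y"], function(g, p) dnorm(g, mean = p, sd = h))
  kx %*% t(ky) / nrow(pts)
}
gx <- seq(-8, 12, length.out = 80)
gy <- seq(-8, 11, length.out = 80)
dens <- kde_grid(pts, h_best, gx, gy)
```

Using one bandwidth `h` for both axes matches the "same bandwidth in each direction" description; `image(gx, gy, dens)` would display the resulting surface.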

Below is an animation of the KDE for each year (2008-2014). It shows no obvious migration of crime hot spots over the years studied.

KDE animation